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Abstract 

It is widely believed that the fermion determinant cannot be treated 
in global acceptance-rejection steps of gauge link configurations that differ 
in a large fraction of the links. However, for exact factorizations of the 
determinant that separate the ultraviolet from the infrared modes of the 
Dirac operator it is known that the latter show less variation under changes 
of the gauge field compared to the former. Using a factorization based on 
recursive domain decomposition allows for a hierarchical algorithm that 
starts with pure gauge updates of the links within the domains and ends 
after a number of filters with a global acceptance-rejection step. Ratios of 
determinants have to be treated stochastically and we construct techniques 
to reduce the noise. We find that the global acceptance rate is high on 
moderate lattice sizes and demonstrate the effectiveness of the hierarchical 
filter. 
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1 Introduction 



The state of the art simulation algorithm for lattice QCD is the Hybrid Monte 
Carlo (HMC) [JJ|2]. As the continuum limit is approached, when the lattice 
spacing a goes to zero, the simulation cost for a given observable scales typically 
as a~^ +z \ The dynamical critical exponent z depends on the observable and is 
responsible for the critical slowing down of the simulations. Recently in [3] it was 
shown that z(Q 2 ) = 5 for the topological charge Q (the scaling might even be 
exponential in 1/a cf. jl]). This is a common problem for all present algorithms 
for gauge theories and the reason has been traced back to the fact that simulations 
on periodic lattices get stuck in topological sectors [5]. In fact, on lattices with 
open boundary conditions z(Q 2 ) = 2 is found in [6]. Our original motivation was 
to look for an alternative algorithm which allows for larger steps in the space of 
gauge fields. 

In recent years new actions to simulate QCD on the lattice have been devel- 
oped, in particular, based on smearing of the gauge links in the Dirac operator. 
In the case of Wilson fermions the stability of the HMC algorithm is influenced 
by the fluctuations of the smallest eigenvalues of the Wilson-Dirac operator [7J . 
The results of [8] show evidence that smearing improves the stability. The HMC 
requires the computation of forces (i.e. derivatives of the Dirac operator with 
respect to the gauge links) and this can be very complicated or even impossi- 
ble, like when HYP smearing [5] is used. A number of solutions exist, like using 
stout [10], nHYP [9] or HEX [8j[TT] smearing or a differentiate approximation 
to the SU(3) projection for the smeared links [12], but flexibility in the choice of 
gauge and fermion actions is highly desirable and so the question arises, whether 
an alternative algorithm without force computations exists. 

In this article we study, motivated by a previous work in the Schwinger model 
[13], an algorithm based on global acceptance-rejection steps accounting for the 
fermion determinant in QCD with Ni = 2 quark flavors. The basic idea is to 
make a gauge proposal which is accepted with a probability that depends on 
the ratio of fermion determinants on the "new" and "old" gauge configurations. 
Such an algorithm has already been used in QCD simulations with HYP-smeared 
link staggered fermions [TH4T6] . with the fixed point action [17] and in [18]. The 
problem with this type of algorithms is their scaling with the lattice volume 
The cost of an exact determinant computation grows with V 3 and the acceptance 
to change a finite fraction of links decreases like exp {—V). 

In order to avoid the computation of exact determinants we use a stochas- 

^^Unless otherwise specified, in this article we use lattice units (i.e. we set a = 1) and V is 
the number of lattice points. 
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tic estimation. This estimation can naively introduce a noise which grows like 
exp(V). In order to tackle these problems we construct a hierarchical filter of 
acceptance-rejection steps which successively filters the large fluctuations of the 
gauge proposal [19|. Hierarchical acceptance-rejection steps based on approxi- 
mations of the determinant with increasing accuracy were introduced and tested 
in [20]. Here the filter relies on an exact factorization of the fermion determinant 
based on domain decomposition [21], which separates the short distance from the 
long distance scales of the lattice. A hierarchy of block acceptance-rejection steps 
was proposed in [22] but has never been tested. 

The article is organized as follows. In Section |2] we introduce the hierarchical 
filter of acceptance-rejection steps. Its construction based on domain decomposi- 
tion is detailed in Section [31 The techniques we use for the stochastic estimation 
of determinant ratios are presented in Section HJ In particular we introduce an 
interpolation of the gauge fields which also allows to compute the exact (i.e. with- 
out stochastic noise) acceptance. Results for the latter and the effectiveness of 
the filter are shown in Section Section O presents simulation results of 16 4 and 
32 x 16 3 lattices using a filter with three acceptance-rejection steps. A comparison 
with the HMC is made for observables like the plaquette or the topological charge. 
In the conclusions Section [7] we also discuss the scaling with the volume. Appendix 
lAl contains the proof of detailed balance, Appendix [B] describes the technique of 
relative gauge fixing used for the stochastic estimation and Appendix O explains 
how the acceptance is enhanced by the use of additional parameters. 

2 Hierarchy of acceptance steps 

Let P(s) be the desired distribution of the states s of a system. Suppose a process 
that proposes a new state s' with transition probability Tq(s — >■ s') and fulfills 
detailed balance with respect to Pq(s). A process with fixed point distribution 
P(s) is then obtained by the combination of such a proposal with a subsequent 
Metropolis acceptance-rejection step [23] 



This hierarchy of a proposal step and an acceptance-rejection step can easily be 
generalized to an arbitrary number of acceptance-rejection steps. The result of the 
first acceptance-rejection step 1) is then interpreted as the proposal for a second 
acceptance-rejection step 2) and so on. If the target distribution P(s) factorizes 



0) Propose s' according to T (s — > s') 



1) 
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into n + 1 parts 

P{s) = P {s) Pi(s) P 2 {s) . . . P n (s) , (2.2) 
the resulting hierarchical acceptance-rejection steps take the form 

0) Propose s' according to T (s — > s') 



2) P™(s-+ a ') = min{l,^} 
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(2.3) 



In the context of lattice QCD it is plausible to assume Pi(s) oc exp(— Si(s)) with 
real actions Si and thus 

P »( g ') - P -Ai( s , s ') c 2 4^ 

Pi{*)~ ' 1 j 

where Aj(s, s') = Si(s') — Si(s). The average acceptance rate in step i) is defined 

by 

( P &),S = E P ( S ) E P °( S ') p i( s ') • --Pi-^') min{l,e- A ^} . (2.5) 

s s' 

It can be computed assuming a Gaussian distribution for Aj(s, s') with variance 
and the result is [T3] (see also [24J) 

(^>.,, = erfc ■ (2.6) 

The acceptance rates might be enhanced by parameterizing and tuning the fac- 
torization ( 12. 21) . see Appendix ICl 

Our goal is to simulate QCD with Nf — 2 mass-degenerate fermions. After 
integration over the Grassmann fermion fields the states s are defined by the gauge 
field U and the target probability distribution is 

p(y) = |detp W )|V^ (27) 

z 

where S g is the gauge action, D is the lattice Dirac operator and Z is the partition 
function 



Z = J D[U]\detD(U)\ 2 e- s ^ u K (2.1 



The integration measure is D[U] = Yl x n dU(x, /x), where dll(x,fi) is the SU(3) 
Haar measure for the link U(x,[i). 

A simple two-step algorithm would consist of some update of the gauge link 
configuration U — > V, which fulfills detailed balance with respect to Pq(U) oc 
exp(— S g (U)), followed by an acceptance-rejection step with the fermion determi- 
nant ratio 

The proof of detailed balance can be found in Section IA.1I 

If the proposal changes only one link and the Dirac operator D is ultra-local 
it is easy to show that the acceptance-rejection step requires only few inversions^]. 
An ergodic algorithm is then obtained by sweeps through the lattice. Thus the 
cost of such an algorithm would scale with the lattice volume V at least like 
V 2 [25] and it requires 0(V) inversions per sweep. 

If, on the other hand, a finite fraction oc V of the links is updated for the 
proposal, the acceptance rate decreases exponentially with the volume. In order 
to see this we write the distribution Pi as Pi(U) oc exp(ln(det D^D)). The action 
difference Ai(£7, U') = ln(det D{U'YD{U')) - ln(det D\U)D(U)) can be written 
as Ai = — £\ ln(Aj) in terms of the eigenvalues A of the operator M'M with 

M = D(U , )~ 1 D(U). (2.10) 

If we assume a Gaussian distribution^ (after averaging over the gauge ensemble U 
and the proposals U') for the logarithms of the eigenvalues \ = ln(Aj) with mean 
zero and variance erf, we can approximate 

(2.11) 

where Ni is the number of eigenvalues A ^ 1. Typically N x oc V and this implies 
that Y,\ is proportional to the volume V. The complementary error function 
in the formula for the acceptance ( 12 .6p has the asymptotic expansion erfc(x) ~ 
exp(— x 2 )/(x-\/7r)(l — l/(2x 2 ) + ■ • ■ ) for \x\ ^> 1 which shows the exponential 
decrease with the volume. 

From the preceding discussion it is obvious that such two-step algorithms 
will not be efficient for large lattices. Indeed numerical experiments show that 
for lattices larger than ~ (0.2 fm) 4 (where all links are updated) the acceptance 
rate quickly becomes less than a percent. However, in the context of low mode 



2 For example, in the case of the Wilson-Dirac operator 12 inversions are needed. 
3 We verified numerically that this assumption is valid to a good approximation. 
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reweighting the fluctuations of the determinant of D{ ow D\ ow , where D\ ow is a re- 
striction of D to its low modes, are found to depend only mildly on the volume [26] . 
The explanation for this observation might be the fact that the width of the distri- 
bution of the small eigenvalues of y/ D^D decrease like 1/V [26] (the fluctuations 
of the eigenvalue gap go instead like 1/yV [7]). Thus, given a factorization of the 
determinant that separates low (infrared IR) and high (ultraviolet UV) modes 

det(D) = det(Duv) • ■ ■ detpm) , (2.12) 

a hierarchy of acceptance steps can be constructed, where the large fluctuations 
of the UV modes go through a set of filters (acceptance-rejection steps) which are 
more and more dominated by the IR modes: 

0) P UV short distance local cheap 

n) P n IR long distance global expensive 

This hierarchy of modes may induce also a hierarchy of costs since it is the low 
modes that cause the most cost in lattice QCD. Furthermore the factorization 
should be exact and the terms simple to compute. Factorizations that realize 
these conditions are already used to speed-up the HMC algorithm, i.e., in the 
context of mass-preconditioning [27] and domain decomposition [21]. Only the 
latter also allows for a decoupling of local updates and will be discussed in the 
following. 



3 Domain decomposition 

Domain decomposition was introduced in lattice QCD in [22] and in [21] the result- 
ing factorization of the fermion determinant was used to separate short distance 
and long distance physics in the HMC algorithm. For definiteness we consider 
here the Wilson-Dirac operator D(U) [28], which may include the clover term 
needed for 0(a) improvement [29|l30]. But our algorithm is applicable to a more 
general class of Dirac operators, see below. 

Suppose a decomposition C of the lattice in non-overlapping blocks b G C 
(cf. Fig. [1] for a 2-dimensional visualization). The lattice sites are labeled such 
that the sites belonging to the first black block come first, then the second black 
block and after the last black block the first white block and so on. The Dirac 
operator can then be written as 

jj _ ( -Dbb -Dbw 
V ^wb D ww 
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Figure 1: Block decomposition of a 2- dimensional lattice. The blocks are coloured 
like a checker board. Picture taken from [21] . 

where (-D ww ) is a block-diagonal matrix with the black (white) block Dirac op- 
erators Db on the diagonal. The block Dirac operators Db fulfill Dirichlet boundary 
conditions and therefore are dominated by short distance physics (if the blocks are 
small enough). The matrices and _D w b contain the block interaction terms. 
The form ( 13. ip induces a factorization of the determinant 

det(D) = J] det(D ft ) det(D) , D = 1 - D^D hw D^D wh , (3.2) 

where D is the Schur complement of the decomposition ( 13. ip and contains block 
interactions, i.e. the long distance physics. A natural separation scale is given 
by the inverse block size 1 /Lb. In the context of the domain decompositioned 
HMC the average force associated with the Schur complement is an order of 
magnitude smaller than the force associated with the block Dirac operators [2T] . 
This indicates that the fluctuations of the determinant of the Schur complement 
are smaller than that of the block determinants. Furthermore the factorization 
( 13. 2 p can be iterated using a recursive domain decomposition 

det(A.) = \\ det(A/) det(D b ) . (3.3) 
b'ec b 

We note that the Schur complement Db fulfills Dirichlet boundary conditions. 
We have implemented the recursive domain decomposition in the freely available 
software package DD-HMC by M. Liischer |31] . In the case of one level of recursion 
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the hierarchy of acceptance-rejection steps is given by 
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(3.4) 



At the beginning the set of links to be updated, the so called active links, is chosen 
such that the acceptance-rejection steps for the smallest blocks, b', at stage 1) in 
Eq. (13. 4p decouple and can therefore be processed in parallel. In the case of Wilson 
fermions with or without clover term the active links are the links that have at 
most one endpoint on the boundary of a block (white points in Fig. [T]). In this case 
the block acceptance steps also decouple if the links in the Wilson-Dirac operator 
(but not in the clover term) are replaced by one level of HYP smearing [32] . After 
the last and global acceptance-rejection step the gauge field is translated by a 
random vector, see Appendix C of [21J. 

If the smallest blocks, b', at stage 1) in Eq. (13 ,4p consist of no more than 
~ 6 4 lattice points, the determinant ratios can be efficiently computed exactly 
by LU-decomposition [33]. If the smallest blocks are larger, we compute their 
determinants by a factorization like in Eq. (I3.3p . The Schur complements at the 
stages 2) and 3) in Eq. (13 .4p are usually too large for their determinant ratios 
to be computed exactly and have to be treated stochasticalljfl The stochastic 
estimation of determinant ratios is the topic of the next section. Following this 
discussion we give to our algorithm the name of Partially Stochastic Multi-Step 
(PSMS) algorithm. 

4 Stochastic techniques for determinant ratios 

Since the numerical cost for the computation of exact determinants grows with 
the cube of the size of the matrix, determinants of Dirac operators for lattices 
larger than 6 4 have to be estimated stochastically. In particular for our problem 
we have to estimate ratios of determinants of Schur complements, which arise 
from a domain decomposition and appear in the acceptance-rejection steps of 
Eq. (13 ,4p . In Appendix |A] we show that such stochastic acceptance- rejection steps 

4 The same applies to Schur complements arising from a factorization of the smallest blocks, 
if that is needed. 
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fulfill detailed balance. In this section we describe in detail the techniques we use 
to reduce the associated stochastic noise. 

In Section 14.11 we discuss the stochastic noise introduced when the determi- 
nant ratio in Eq. ( 12. 9 j) (for generic Dirac operators D) is evaluated stochastically. 
The stochastic noise depends on the spectrum of generalized eigenvalues of the op- 
erators forming the ratio [13] . In order to reduce it we apply techniques described 
in Section 14.21 and Section 14.31 In Section 14.21 we discuss a relative gauge fixing 
of the gauge field U and U'. This gauge fixing is applied for the construction of 
a gauge field interpolation, a new method which we present in Section 14.31 The 
gauge fields are linearly interpolated and this induces a factorization in terms of 
ratios of operators which can be made arbitrarily close as the number of inter- 
polation steps increases. In particular, there exists the limit in which the exact 
ratio is obtained. In Section 14.41 the properties of the Schur complement are re- 
viewed. In this particular case the noise vector can be restricted to a subspace of 
the boundary points of the blocks. In Section l4"75l we support the introduction of 
these techniques by numerical results. 

4.1 Stochastic estimation of determinant ratios 

We replace the determinants of ratios of Dirac operators in Eq. (12. 9p by stochastic 
estimators 

min{l,det(M t M)- 1 } — ► min jl, e -\ M ^ + ^ 2 J , (4.1) 

where the ratio operator M is defined in Eq. (I2.10p . In Eq. ( 14. ip r\ is a complex 
Gaussian noise vector that is updated before each acceptance-rejection step and 
\r)\ 2 is its norm squared, see Section TA. 2 1 The average over r\ of a function f{rf) is 
defined by 

</fa)>„ = / D[ V ]e-M 2 f( V ). (4.2) 

The measure D[rj\ is normalized such that j D[rj] exp(— \i]\ 2 ) = 1. The algorithm 
satisfies detailed balance (the proof is given in Section IA.2[) and yields an accep- 
tance rate that is bounded from above by the exact acceptance in Eq. (12. 9p [15] . 
There are other possible choices for the distribution of rj than a Gaussian distribu- 
tion. But because of the central limit theorem these other choices are equivalent 
to the Gaussian distribution in the large volume limit. 

The stochastic noise introduced in the acceptance-rejection step by Eq. (14. ip 
has the effect of replacing in Eq. (12. 6p 

£ 2 — ^a 2 = S 2 + (a stoch ) 2 , (4.3) 
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where 

((J stoch )2 = (A 2 )^ - ((A)^) 2 (4.4) 

with A = | Mr/ 1 2 — |r/| 2 . The average (•) Lrc// is taken over the gauge ensemble 
Z7, the proposals U' and the noise vectors rj. For given [/ and U' Eq. ( 14.4p can 
be computed by performing the integrations over rj in the basis of orthonormal 
eigenvectors of M^M with eigenvalueqj A*., cf. [13]. The result is 

(a- ch ) 2 = /^(A fc -l) 2 \ . (4.5) 

\ k I u,U' 

The eigenvalues A = 1 do not contribute to the variance. If we denote by hi the 
full width at half maximum (FWHM) of the distribution of the eigenvalues A& 
and by Ni the number of eigenvalues which are not one, we can approximate 

(a stoch ) 2 « Nih\. (4.6) 

It becomes clear that the smaller Ni and hi are, the larger the stochastic accep- 
tance will be. Furthermore in [31] it was noted that the spectrum of M^M has to 
fulfill the condition A > 0.5, because otherwise the variance of the quantity under 
the minimum function in Eq. (14. Xp is not defined. 



4.2 Relative gauge fixing 

In [13] (see also [17]) it was noticed that relative gauge fixing of the configura- 
tion U and U' reduces the stochastic noise in Eq. ( |4.4|) . Under a gauge trans- 
formation g{x) G SU(3), the gauge links transform as U(x,]jl) —> U 9 (x,fi) = 
g(x)U(x, fi)g(x + fi)^ 1 and the Dirac operator as 

D(U 9 ) xy = g(x)D(U) xy g(y)'~ 1 (no sum over x and y) , (4.7) 

where we suppress the spin indices. Further we define a scalar product of two 
gauge fields as 

(U,U') = -L^ReTr {l - U(x, ^U'(x, M )} . (4.8) 

X,fl 

Relative gauge fixing is defined through the minimization 
min (U 91 , U' 92 ) = min -\- V Re Tr { 

91,92 91,92 12V I 

X,fJ, 

1 - g^x + fi)U(x, gi(xy 1 g 2 (x)U' (x, [i)g 2 (x + fry 1 } . (4.9) 

5 The eigenvalues A of M^M are equivalent to the generalized eigenvalues of the problem 
D(U)D(Uy X = *D(U')D(Uy X - 
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We determine g\ and gi before the acceptance- rejection step Eq. (14.11) . where we 
use 

M = D(JJ m Y x D(XJ^). (4.10) 

Relative gauge fixing does not change the exact acceptance rates in Eq. (13.41) 
but in general improves the stochastic acceptance rate in Eq. (I4.ip . In order to 
show detailed balance in the latter case, consider the reverse transition U' — > U, 
for which the minimization is min^^ (U' 91 , U 92 ). As one can immediately see 
by taking the complex conjugate of Eq. (I4.9f) the result is given by gi = #2 and 
g2 = g\. This implies for the reverse transition 

M -> M = D(U~ 92 y l D(U'~ 91 ) = M" 1 , (4.11) 

which is precisely the property needed to prove detailed balance [13]. 

In the above procedure, the choice of g\ and #2 is not unique. In fact one can 
transform gi — > g\h and g<i — > g^h by some other gauge transformation h(x) and 
the minimization condition Eq. (14 .9p is unchanged. Instead we choose^ 

9i = 9? = 9- (4-12) 

The numerical procedure for the minimization Eq. (I4.9P using Eq. ( I4.12p is de- 
scribed in Appendix [Bj 

In the proposal U — > U' we only change active links in the blocks and we 
restrict the gauge transformations g in Eq. (I4.12p to the black points in Fig. [TJ 
One reason for this is that the critical slowing down of such a local (i.e. restricted 
to the blocks) minimization is reduced compared to a global minimization over 
the entire lattice. 

4.3 Gauge field interpolation 

In order to ensure A > 0.5 and bring the spectrum of M^M closer to one, one 
could employ the method of determinant breakup introduced in [20(|34"] . It uses the 
factorization det(M^M) = [det((M^M) 1 / 7V )] 7V and in the stochastic acceptance- 
rejection step Eq. (14. ip each factor is then replaced by a stochastic estimator with 
an independent noise vector. The effect on the spectrum of M^M is to replace 
A — > X 1 ^ . The gauge field interpolation which we propose in this article has a 
similar effect but avoids the computation of 1/iVth roots of M^M. 

We introduce a sequence of intermediate fields Ui, i = 0, . . . , N which starts 
from the gauge field U = U 9 and ends with the gauge field Un = U' 9 . g is the 

6 We thank Ulli Wolff for suggesting this choice. 
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gauge transformation in Eq. (14. 12ft . The determinant of M^M can be factorized 
like 

N-l 

det (M f M) = Yl det(M}Mi) , (4.13) 

i=0 

where 

Mi = DiU^DiUi) . (4.14) 

The stochastic acceptance-rejection step in Eq. (14. ip is done by drawing one in- 
dependent Gaussian noise vector £j for each factor 

min|l,e E S 1_|M ^ |2+l5i|2 J . (4.15) 

The cost is then one inversion for each factor. In order for the algorithm to fulfill 
detailed balance the intermediate gauge configurations have to be the same when 
doing the reverse change U' — » U. The proof of detailed is given in Section IA.31 
The simplest way to construct such an interpolation is 

N — i i -i 

U i (x,n) = ^ r U°(x, f i) + -U' 9 (x,fj,), z = 0,l,.-. ,N-1, (4.16) 

which interpolates linearly between Uq = U 9 and Un = U' 9 . The interpo- 
lation has no physical meaning, only numerical efficiency counts. The inter- 
mediate fields are not SU(3) matrices, in the Dirac operator we use Uj (and 
not Uf ) in order to preserve the 75 Hermiticity of the Wilson-Dirac operator. 
Since \ \Ui — Ui+x\\ oc 1/AT, V? < AT, we expect the eigenvalues of M\Mi to be 
= 1 + 0(/ii/iV) and so the FWHM of their distributiorfl can be approximated 
by /itv ~ hi/N in terms of the FWHM h\ of the eigenvalue distribution of M^M. 
The stochastic noise in the acceptance-rejection step is reduced to 

(ap ch ) 2 ^NN 1 h 2 N ^N 1 ^ (4.17) 

as compared to Eq. ( 14. 6p . An important feature of this method is the limit 
— > 00, for which cr^ och — > and we recover the exact acceptance, cf. Eq. ( 14. 3p . 

4.4 Schur Complement 

The Schur complement in Eq. ( 13. 2ft is D = 1 — Q with Q = D^D^D^. Let 
us denote by P the orthonormal projector to the space of the white points in the 

7 In the case of the full Dirac operator, we find numerically that the smallest (largest) 
eigenvalue change with N as A^ in ~ exp{— b/N} (Am ax ~ exp{6'/iV}) for positive constants b 
(&'). This is the same behavior one obtains using the determinant breakup in 1/iVth roots. 
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black blocks in Fig. [TJ For the points which have only one nearest neighbor on a 
different block, P projects to only two of the four spin components. The explicit 
definition of P can be found in Appendix B of [21]. It does not depend on the 
gauge field and it satisfies the properties D wb P = D wh and P 2 = P which imply 

det(l-Q) = det(l -PQ). (4.18) 

This means that one can use 1 — PQ instead of D in Eq. ( 14. ip and therefore the 
noise r\ is defined only on the space invariant under P. We also need to apply the 
inverse of the operator 1 — PQ which is [21] 



(1-PQ)- 1 = l-PD- 1 D wh . (4.19) 

Here D wh is meant to act on the total space of points (by padding with zeros). For 
a global lattice of sizes in directions fj, — 0, 1, 2, 3 and a domain decomposition 
into blocks of sizes 1^, the dimension of the space invariant under P is 

dim(p) = 6 f[ ^(^-^r 1 - 4 E^- 1 )) • ( 4 - 2 °) 

^=0 4 " V=o Lv u=0 J 

For the number N\ in Eq. ( 14. 6 p we have N\ < dim(P). On a lattice with the 
same number of points L in all directions, if we choose = L/2 (16 blocks) then 
dim(P) « 48L 3 , to be compared to V = 12L 4 if we were to consider the full Dirac 
operator. 

The reduction of N\ alone turns out not to be sufficient to make stochastic 
acceptance- rejection steps like in Eq. (14. ip . with the Schur complement ratio, 
efficient. Moreover the relative gauge fixing described in Section 14.21 does not 
directly help in reducing the stochastic noise in this case. The reason is that the 
restriction of the gauge transformations to the black points in Fig. [T] leaves the 
Schur complement invariant. This is why the gauge field interpolation is necessary 
to further reduce the noise. As we show in the next section relative gauge fixing 
has an impact on the interpolation. 



4.5 Numerical results 

The interpolated fields U{ in Eq. (14.161) change if we apply first a relative gauge 
fixing of U and U', which minimizes their distance in the sense of Eq. (14.81) . In 
Fig. [2] we show the behavior of the plaquette of the interpolated fields C/j. In the 
computation of the plaquette, if U denotes a link then the link in reversed direction 
is defined by U* (and not by U~ l ). Without relative gauge fixing the intermediate 
configurations look like if they were thermalized configurations of a smaller /3. 
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Figure 2: The plaquette value for the interpolated fields Ui defined in Eq. (|4.16p is 
shown as a function of i. The start and end fields Uq and Un [N = 24, 40 pairs) are 
8 gauge configurations taken from simulations of plain Wilson fermions at ft = 5.6, 
k = 0.15825, where active links in 4 4 blocks are changed. We compare plaquette values 
with (red circles) and without (black pluses) relative gauge fixing. 

The links become rougher. This is understandable if one imagines that the gauge 
configurations U and U' lay somewhere randomly in the configuration space. So 
the path will not go over configurations which are similar to the "thermalized" 
ones. With relative gauge fixing, the path of the interpolated links yields plaquette 
values which are approximately constant, cf. Fig. [2] which also shows the two- 
sigma band of a thermalized ensemble. In Fig. [3] we show the spectra of the Schur 
complement ratios M^Mj in Eq. (I4.14p . Since relative gauge fixing is applied to 
all links the spectrum is narrower and the requirement A > 0.5 can be fulfilled for 
a relatively low value of interpolation steps N. As expected from the behavior of 
the plaquette the width of the spectrum does not change significantly along the 
interpolation. 

There are many possible ways to define alternative interpolations replacing 
Eq. (I4.16p . For example we could normalize the links by substituting Ui(x,fi) — > 
Ui(x,fi) det(Ui(x, /x)) -1 / 3 . It turns out that in this case the intermediate configu- 
rations look like if they were thermalized configurations at larger 0. As a conse- 
quence the spectrum can develop negative eigenvalues for small quark masses. We 
note that a mass-shift towards larger masses can be generated by multiplying the 
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A 

Figure 3: The spectrum of the Schur complement ratio M.\Mi defined in Eq. (|4.14p is 
shown as a function of i. The start and end fields Uq and Un (N = 12) are quenched 
4 4 lattices where all links are changed (and relative gauge fixed). The plain Wilson- 
Dirac operator with mass am® = 0.56 is used and the Schur complement is defined for 
a domain decomposition in 2 4 blocks. 

links with a common factor exp(a), a < 0, in the Dirac operator. Effectively such 
a factor (albeit with a different value for each link) can be easily incorporated 
into Eq. (14.161) by multiplying the links Ui(x,/i) with an appropriate power of 
their determinant det(Ui(x, fi)). If we do not normalize the links, a "mass shift" 
towards larger masses is automatically realized because det(Ui(x, fi)) < 1. But 
there is some room for improving the efficiency of the method. In the following 
we will use the simple interpolation given in Eq. (|4.16p . 

Finally we discuss what happens if the relative gauge fixing is extended to 
the entire lattice and is not restricted to the points inside the blocks. Links 
which are unchanged after the pure gauge update would change through a global 
minimization. This could introduce additional noise and indeed this is the case for 
the full Dirac operator but not for the Schur complement. The global minimization 
slightly improves the behavior of the interpolated fields in Eq. (I4.16P but this effect 
is not large and the danger to run into negative eigenvalues as discussed above 
increases. 
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Table 1: Optimal parameters for the 4-step PSMS algorithm (representative set) 
for plain Wilson fermions at /3 — 5.6 and k = 0.15825. 

5 Volume dependence of the exact acceptance rate 

We simulate QCD with N{ = 2 flavors of mass-degenerate quarks. The action for 
the gauge field is the Wilson plaquette gauge action [28J 

S g = PS W (U) = ^ ReTr {1 - U(p)} , (5.1) 
v 

where p runs over all oriented plaquettes (i.e., each plaquette is counted with 
two orientations). For the fermions we use the plain Wilson-Dirac operator [28] 
(without clover term and without smearing) with bare quark mass mo, whose 
action on a quark field ip is given by 

(D W (U) + m )iP(x) = (4 + m )i;(x) - 
3 1 

^{U (x, /i)(l - 7m )^(x + jj) + U(x - fa /x)t(l + lfl )^{x -//)}. (5.2) 

fi=0 

The hopping parameter is defined as k = l/(2mo + 8). In this section we simulate 
at parameters (3 = 5.6 and k = 0.15825. Theses values corresponds to a lattice 
spacing a = 0.0717(15) fm [35] and a pseudoscalar mass m PS ps 404 MeV [36] 
(determined on a larger 32 x 24 3 lattice). 

We implement a 4-step PSMS algorithm based on a domain decomposition 
with block size 4 and on a hierarchy of three acceptance- rejection steps. Our code 
is based on the freely available software package DD-HMC by M. Liischer [31] . In 
order to enhance the acceptance rates we introduce parameters as explained in 
Appendix [C] 

In the first step we update the active links in the 4 4 blocks, which amount 
to a fraction of about 9.4% of all links. The gauge proposal consists of 500 iter- 
ations of two Cabibbo-Marinari heat-bath |37j sweeps (with reversed sequence of 
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Figure 4: The variance of the stochastic estimator in the global step from simulations 
of plain Wilson fermions at j3 = 5.6, k = 0.15825 with the 4-step PSMS algorithm. The 
left plot shows the exact variance £3 (black circles) as a function of the lattice volume 
V together with a linear fit (red line). The blue diamond is the result using a 5-step 
PSMS algorithm, see text. The right plot shows the volume dependence of the slope 
s(V) defined in Eq. (15. 6j) together with a linear fit. 

gauge link updates and random choice of SU(2) subgroups) at the shifted coupling 
/3q ^ = 5.6918. The gauge proposal is then subjected to a first acceptance-rejection 
step containing a plaquette action S^? YP like in Eq. (15. ip but where the plaquettes 
are constructed from HYP smeared links with the parameters of [32] (one level 
of smearing). We do one iteration of this step with 95% acceptance. The result- 
ing proposal goes into a second acceptance-rejection step containing the action 
Sb = X] bgC 2 ln(det(-Db)) of the block determinants (one iteration with 76% ac- 
ceptance). We emphasize that the first and second acceptance-rejection (or filter) 
steps are done block-wise and can be therefore parallelized. Finally the gauge pro- 
posal which passed through the first two filter steps enters the global acceptance- 
rejection step with the Schur complement of the 4 4 block decomposition. This is 
a stochastic acceptance-rejection step performed according to Eq. (14.151) using the 
interpolation with intermediate fields Ui, i = 0, . . . , N in Eq. (14.161) . The optimal 
parameters can be tuned following the prescription given in Section EH We note 
that they depend only mildly on the global lattice volume and a representative 
set is listed in Table [TJ 

The global acceptance-rejection probability is P^cl = min {1, exp(— A 3 )}, 
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where (cf. Eq. ( lCl2| ) 



N-l 

A 3 = $ 0) AS W + /3« A^ YP + 4 2) ]T 2A ln(det(A,)) + ^ ^(M/M, - l) Vi (5.3) 

beC i=0 

and Mi is the ratio of Schur complements. On lattices with V = 8 4 up to V = 
16 we measure, for different values of the gauge field interpolation steps N in 
Eq. (I4.13p . the variance 

( x 3 2 (F,iV) = <(A3-(A3)) 2 > l/i ^. (5.4) 

At fixed volume V we extrapolate linearly in 1/N to zero, thus obtaining an esti- 
mate for the exact variance S 2 (V^) as a function of the volume. The justification 
for this extrapolation is given by Eq. (14. 3p . which in this case means 

al(V 1 N) = J:l(V) + {af^(V)) 2 (5.5) 
and by Eq. (14.171) . which implies 

«- h (V)) 2 « i s(V) , (5.6) 

with the slope s(V) is approximately given by Ni h\. Here N\ and h\ refer to 
the Schur complement ratio. Note that a|(V, N) contains also contributions from 
parts of the action other than the Schur complement (cf. Eq. (1C.5P ) but which 
do not depend on N. The extrapolated exact variance Sg(l/) is shown in the left 
plot of Fig. S] as a function of V. The data can be very well fitted by a straight 
line constrained to zero at zero volume (red line). The slopes s(V) of the linear 
fits of cr|(V, N) in 1/N are plotted against the volume V in the right plot of 
Fig. HI The data of the slope can be also well fitted by a straight line constrained 
to zero at zero volume (red line). Assuming that Ni is equal to the dimension 
of the projector P in Eq. (I4.20p and taking into account that the block size is 
here constant and equal to 4 4 , we deduce that Ni oc V. Therefore our results for 
the slope means that the FWHM hi of the generalized eigenvalues of the Schur 
complements does not significantly depend on the volume. 

Via (12. 6p the exact acceptance rate can be determined^] from the variance 
Eg(y). The exact acceptance rates as determined from the variances are plotted 
in Fig. [5] (black circles) together with the result from the fit to S§(V) shown 
in the left plot of Fig. HI The 4-step PSMS algorithm of this section shows a 
good acceptance for lattices up to 16 3 x 8. This is the region where the error 

8 We tested the (tacitly assumed) validity of the Gaussian model for finite values of N. 
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Figure 5: The exact global acceptance is plotted as a function of the variance S 2 
for simulations of plain Wilson fermions at /3 = 5.6, k = 0.15825 with the 4-step 
PSMS algorithm (black circles). The point corresponding to the blue diamond is from 
a simulation of a 16 4 lattice with a 5-step PSMS algorithm. 

function can be approximated by a Taylor expansion with a linear leading term 
erfc(x) = 1 — 2x/y/n + 0(x 3 ). Fig. [9] shows that the acceptance rates, which 
one would obtain from the Schur complement alone (blue diamonds), are much 
smaller. 

The efficiency of the hierarchy of filters in enhancing the acceptance of the 
global step can be demonstrated by simulating the largest 16 4 lattice using a 5- 
step PSMS algorithm. For this we use a recursive domain decomposition of the 
16 4 lattice in 8 4 and 4 4 blocks, cf. Eq. (13. 3p . The additional filter with respect to 
the 4-step PSMS algorithm is a stochastic acceptance-rejection step accounting 
for the Schur complements of the 4 4 blocks within the 8 blocks, cf. Eq. (13. 4p . 
The acceptance of the global step (accounting for the global Schur complement) is 
increased by this further filter step, cf. the blue diamond in Fig. [51 Using recursive 
domain decomposition to keep the largest block size at L/2 (where V = L 4 ), the 
volume dependence of S 2 in the global step is V q (dotted line in the left plot of 
Fig. H]) with q w 0.9 (determined on our available lattices 8 4 and 16 4 ). At large V 
one expects the asymptotic behavior q = 3/4, cf. Section |4~41 
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Table 2: Parameters for the 4-step PSMS algorithm for plain Wilson fermions at 
= 5.8 and k = 0.15462. 

6 Numerical tests of the algorithm 

We present results of simulations of Nf = 2 flavors of mass-degenerate plain Wilson 
fermions on a 16 4 lattice at (3 = 5.8 and k = 0.15462. The clover coefficient is set 
to zero and the fermions have anti-periodic boundary conditions in time direction. 
The lattice spacing is estimated in [35] to be 0.0521(7) fm and the pseudoscalar 
mass is 381 MeV [36] (determined on a larger 64 x 32 3 lattice). 

In the simulations in Section our smallest blocks are 4 4 and the gauge 
proposal changes the active links in these blocks. It turns out that larger blocks 
are better in terms of changing the topological charge and allow for higher global 
acceptances (at somewhat higher computational cost). That is why we change our 
setup in this section and use 8 4 blocks as our smallest ones. The gauge proposal 
changes the active links in a 6 4 hypercube inside each of the 8 4 blocks, which 
amounts to updating 7.9% of all gauge links. 

We adopt a 4-step PSMS algorithm whose parameters and acceptances are 
summarized in Table |2j For each of the steps i = 0,1, 2, 3, n, is the number of 
iterations per step and Ni is the number of gauge field interpolation steps (for 
stochastic estimates of Schur complement ratios). The gauge proposal consists 
of a number n of iterations of symmetrized sweeps of Cabibbo-Marinari heat- 
bath [37] and over-relaxation [3E1E9] updates. One iteration consists of one heat- 
bath (HB) sweep and L/2 over-relaxation (OR) sweeps followed by the reversed 
sequence of link updates (so in total one HB + L/2 OR + L/2 OR + one HB 
sweeps), where for each link we choose with probability 1/2 one sequence of SU(2) 
subgroups and with probability 1/2 the reversed sequence. The first acceptance- 
rejection step is a Metropolis step for a HYP plaquette gauge action which has 
to be subtracted in the successive filter steps. The determinant of the 8 4 blocks 
is factorized by a domain decomposition in 4 4 blocks. The second acceptance- 
rejection step accounts for the exact product of the 4 4 block determinants times the 
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Figure 6: Histogram distribution of the plaquette values from simulations of plain 
Wilson fermions and Wilson plaquette action at (3 = 5.8, n = 0.154620, 16 4 lattices. 
We compare results for the 4-step PSMS algorithm and for the HMC. 

determinant of the Schur complement of the decomposition of the 8 4 blocks in 4 4 
blocks. The latter is treated stochastically. These two acceptance-rejection steps 
are performed independently for each 8 4 block. The third stochastic acceptance- 
rejection step contains the global Schur complement of the decomposition of the 
16 4 lattice in 8 4 blocks. 

In Fig. we show the histogram distribution of the plaquette value. We 
compare the results from 4 replica simulated using the 4-step PSMS algorithm 
(red bins) with the results from a long HMC simulation (white bins). The HMC 
simulation is done with the DD-HMC algorithm [2TJ, |5D] using 8 4 blocks. The 
distributions agree perfectly. 

In the upper two plots of Fig. [7| the histories of the topological charge are 
shown. The topological charge is defined by 



using a discretization of the field strength tensor (see e.g. [30]) in which 

gauge links constructed from three levels of HYP smearing are used. We consider 
4 replica simulated using the 4-step PSMS algorithm (left plot) and 3 replica 
simulated with the HMC (right plot). The horizontal dotted lines are determined 




(6.1) 
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Figure 7: Histories of the topological charge Q (upper plots) and histograms of the 
deviations of the replicum means of Q 2 from the total mean divided by the replicum 
errors (lower plots). The left plots show the results of 4 replica simulated with the 
4-step PSMS algorithm. The right plots show the HMC results from 3 replica. 



from an ad hoc fit to the histogram of the topological charge using 3 Gaussian 
functions (one centered at zero and the other at values ±m corresponding to 
the dotted lines). In order to compare the Monte Carlo histories of the two 
algorithms, we take the Monte Carlo units which correspond to a full change of 
the gauge configuration. To this end, on the x-axis of the history plots we take, 
for the PSMS algorithm, the number of global acceptance steps multiplied by the 
fraction R of links changed and by the global acceptance while, for the HMC 
algorithm, we take the number of trajectories multiplied by the ratio R of active 
links and by the acceptance. The right plot shows that the long HMC replicum 
was not able to really tunnel to a topological sector different than zero, while such 
a tunneling occurred at least once for all PSMS replica. Indeed we compared the 
distributions of the topological charge squared Q 2 for the PSMS replica and the 
long HMC replicum and found that they agree well around Q 2 = but differ at 
larger values. Therefore we started two more HMC replica from configurations 
with topological charge different than zero (generated in the PSMS ensembles), 
which are also shown in Fig. [71 In one of these two additional replica we observe 
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Figure 8: The variance a 2 (V, N) (left plot) in the global step for simulations of plain 
Wilson fermions and Wilson plaquette action at (3 = 5.8, k = 0.154620. Data for 
V = 16 4 (circles) and V = 32 x 16 3 (diamonds) are very well fitted as functions of 1/N 
using a global linear fit (red lines). In the right plot we show the resulting acceptances. 

a clear tunneling from topological sector zero to nonzero. In the lower two plots 
of Fig. [7J we show histograms of the deviations of the replicum means of Q 2 from 
the total mean divided by the replicum errors (the quantity in Eq. (30) of [4~T] ; 
left plot, PSMS; right plot, HMC). The goodness of the replica distribution is 
measured by the probability (goodness-of-fit) of a constant fit to the replicum 
means. The goodness is 0.7 for the PSMS algorithm and 0.05 for the HMC. A 
value much below 0.1 is very unlikely. The expecation value (Q 2 ) is 0.37(15) for 
the 4-step PSMS algorithm and 0.281(81) for the HMC algorithm. The errors 
are determined using the method of [H]. From leading order chiral perturbation 
theory we expect (Q 2 ) ~ 0.19. We emphasize that Fig. [7] is a comparison made at 
one lattice spacing only. The main problem is the scaling with the lattice spacing 
which we cannot address in the scope of this paper. 

In Fig. E] we plot the variance cr 2 (V, N) (left plot) and the acceptance (accord- 
ing to Eq. (I2.6p . right plot) in the global acceptance-rejection step (see Eq. (15. 5p ) 
as a function of 1/N and N respectively. Together with the data for the 16 4 lat- 
tices we present data for 32 x 16 3 lattices. Motivated by the results of Section [5] 
we perform a global fit to the variances of the form 

°l(V,N) = V (ai + jf) (6.2) 

with fit parameters a\ and ci2- The exact variance turns out to be Y* 2 = 0.50(2) 
and E| = 1.00(4) for the 16 4 and 32 x 16 3 lattices respectively. This corresponds 
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to acceptances 0.724(5) and 0.617(7). 

A cost comparison of the simulations of the 16 4 lattice can be performed by 
comparing the number of full inversions of the Wilson-Dirac operator needed to 
update all the links. Using the 4-step PSMS algorithm at optimal parameters, the 
global acceptance-rejection step has N = 96 gauge field interpolation steps (each 
of which requires one inversion of the full operator for the inversion of the global 
Schur complement, see Eq. ( 14.19P ) and 64% acceptance. This means that we need 
~ 23 global steps or 2200 inversions to get a new gauge configuration. If we 
instead run the 4-step PSMS algorithm with N = 24 and 42% global acceptance, 
one new gauge configuration is obtained after w 35 global steps or 840 inversions. 
The DD-HMC needs only 120 inversions for one new gauge configuration. This 
naive cost comparison does not take into account effects of autocorrelation times, 
which are hard to estimate for observables like the topological charge. 

7 Conclusions 

We have developed and tested the PSMS algorithm for lattice QCD that consists 
of a hierarchical filter of acceptance-rejection steps. The hierarchy is based on 
an exact factorization of the fermion determinant. Although other factorization 
are possible, we here deploy (recursive) domain decomposition as it separates the 
determinant in a local (blocks) and global part (Schur complement). 

We were able to determine the exact global acceptance rates for volumes up to 
(1.2 fm) 4 and demonstrate that the filter is successful in fighting the exponential 
decrease with the volume. 

The global acceptance-rejection step with the Schur complement remains ex- 
pensive. We estimate a factor of ten in comparison with the HMC for the setup 
of Section |6j The expected scaling of the cost of the algorithm with the volume is 

V (inversion) x \/ 3/4 (iV) x 1/ (acceptance) . 

The first factor is due to the cost of one inversion of the Dirac operator and the 
second factor arises from the necessity to keep the stochastic noise low. A constant 
global acceptance is achieved for constant variance S 2 of the action differences that 
go into the global step, i.e., a\ oc 1/V is needed (cf. Eq. ( 12. lip ). Instead we find 
erf ~ const as V is increased (cf. Fig. HI). Previously the fluctuations of the 

small eigenvalues of V D^D have been found to decrease like 1/V [26]. We do not 
seem to see this behavior for <j\. The reason might be that our separation scale, 
given by the inverse block size 1/Lb, is too large. For the simulations at (3 = 5.8 
((3 = 5.6) with a block size of 8 this scale is approximately 500 MeV (360 MeV). 
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At the moment the performance of the PSMS algorithm is worse than the one 
of the HMC algorithm, but the scaling of autocorrelation times of the topological 
charge with the lattice spacing has to be studied to make a definite conclusion. In 
Fig. [7] we present evidence that the PSMS algorithm is more efficient in sampling 
the topological sectors compared to the HMC. It is still relevant to study alter- 
natives to the HMC and there are prospects of using and improving the PSMS 
algorithm. One possibility is to apply reweighting for the Schur complement, 
cf. [12] where we demonstrate that reweighting factors for the Schur complement 
have a better scaling with the volume compared to the full operator. Improved 
gauge actions can be included in the hierarchy of acceptance steps and there is 
room for better choices of the gauge updates within the blocks. Also factorizations 
of the determinant other than domain-decomposition could be used. 

The techniques for the stochastic estimation of determinant ratios, which we 
introduced in this article for the acceptance-rejection steps, can be equally well 
applied to the case of reweighting, e.g., in the quark mass jl2] or to account for 
electromagnetic effects. 

Acknowledgement. We thank Tony Kennedy for correspondence on the 
proof of detailed balance, Martin Liischer for a clarification on the fluctuations of 
small eigenvalues of the Dirac operator, Rainer Sommer for discussions and Ulli 
Wolff for comments on the relative gauge fixing. The Monte Carlo simulations 
were carried out on the cluster Stromboli at the University of Wuppertal and we 
thank the University. 

A Proofs of detailed balance 

A.l Exact acceptance-rejection steps 

The simplest setup of our algorithm is to split up the gauge weight in Eq. (I2.7P from 
the fermionic one. The idea is to propose a new gauge configuration U' by a pure 
gauge updating algorithm and accept or reject it by a Metropolis step accounting 
for the fermionic weight. Let T (U — > U') be the transition probability for the 
pure gauge proposal which has to satisfy detailed balance for the distribution (see 
below) 

P o(U ) = «P W» , (A.l) 

^0 

where Z Q is the partition function for the gauge action S g . The Metropolis 
acceptance-rejection step [23] consists of accepting or rejecting the proposal U' 
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with probability 

P-W^nrfnfl,*™}. (A,) 
The transition probability for this algorithm is 

T(U £/') = T (U U')P, CC (U, U') 

+S(U - If) ^1 - y £>[[/"] r (C7 t/")^acc(f/, 17")) • (A.3) 

In order for T to satisfy detailed balance for the distribution P in Eq. ( 12. 7p . T 
has to satisfy detailed balance for the distribution P in Eq. ( lA.ip . If the gauge 
proposal is a sequence of gauge link updates, their order has to be symmetrized 
or chosen randomly [2D] . 



A.2 Stochastic acceptance-rejection steps 

The exact calculation of the determinant ratio in Eq. (1A.2|) is numerically pro- 
hibitive. It can be replaced by a stochastic approximation that maintains detailed 
balance exactly [T3] . 

We follow closely Appendix A, in particular section A. 5, of [22]. The vari- 
ables of the system (the gauge field) are enlarged by adding auxiliary stochastic 
variables, which are called pseudofermions and are only used in the stochastic 
acceptance-rejection step. The equilibrium probability distribution for the en- 
larged system of gauge field U and pseudofermion r\ is 

. W) = «n^KP)) (A . 4) 

The pseudofermion is a complex-valued field 77 with the measure 

jjfo] = TT dRe (V*,<*) dlm (Vx,a) ( A5 ) 
11 n 

x,a 

where the index a contains spin and color degrees of freedom. The norm squared 
of r) is defined by the scalar product (77,77): 



\V\ 



|2 



{V,V) = ^2Vx,aVx ! a- (A.6) 



The equilibrium distribution of the gauge field alone is recovered by integrating 
over the pseudofermion: 

P(U)= [ D[r})P(r],U) . (A.7) 
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We also define the conditional probability 

P(n U) e-murM 2 

mu) - - www (A - 8) 

to generate the pseudofermion field r\ given the gauge field U. 

The algorithm to update the enlarged system consists of alternating two 
Markov steps. The first is a global heatbath step for updating the pseudofermion 
at given gauge field U. A new pseudofermion rj distributed according to P(rj\U) 
in Eq. fl A. 8j) is generated through 

V = D{U)£, (A.9) 

where £ is a Gaussian random pseudofermion generated with probability distribu- 
tion = exp(— 1£| 2 ) 0. The second step is a Metropolis step for the gauge field 
at given pseudofermion. A new gauge field U' is proposed with transition proba- 
bility Tq(U — > U'), which satisfies detailed balance for the distribution Pq{U) in 
Eq. f lA.ip . The proposal is followed by an acceptance- rejection step with proba- 
bility 



P (U)P( V ,U> 



-\D(U')\ 2 V 



min { 1, A/ — - \^ } = min <j 1, _ ]D{u)?ri } ■ (A.IO) 



P(r},U)P Q (U> 



Both the heatbath and Metropolis steps separately fulfill detailed balance with 
respect to the combined probability distribution P(r), U) in Eq. flA.4j) [20J. There- 
fore also their composition has the correct fixed point probability |43j . 

We consider now a composite update step consisting of an heatbath update 
for the pseudofermion in Eq. (IA.9j) immediately followed by a Metropolis step for 
the gauge field in Eq. f ]A.10j) . If after this we forget the pseudofermion field, this 
can be viewed as an update for the gauge field alone with acceptance probability^ 



P m {U, U') = J D[n\ P(v\U) min J 1, 



P (U)P(r],U') 



P(t],U)Po(U>) 

= J D[$e-Wmm{l,e-\ M ^ 2 } , (A.ll) 

where the ratio operator M is defined in Eq. f)2.10p . The associated transition 
probability in Eq. flA.3j) . where now P acc (U,U') is given by Eq. (lA.llj) . satisfies 
detailed balance for the equilibrium probability P{U) due to the property [22] 

[P(U)/P (U)} P acc (U, U') = [P(U')/P (U')} P acc (U', U) , (A.12) 



9 The pseudofermion measure in Eq. (|A.5[) is normalized such that J -D[C]p(0 = 1. 
10 We thank Tony Kennedy for clarifying this point in a correspondence. 
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or equivalently [13J 

Wwr^ mr '- (A - 13) 

In practice, the acceptance step Eq. flA.llj) is done by drawing one Gaussian 
distributed pseudofermion £ and accepting or rejecting depending on the argument 
under the min function. We note that it is not possible to perform the average of 
the argument under the min function over many pseudofermions, as this violates 
detailed balance. The acceptance probability in Eq. f lA.llj) was computed in [13] 



PUU, U') = nun(l, 1/AO ]J l^TT" (A.14) 

... Aj Aj 

in terms of the eigenvalues Aj of M^M. It is bounded by the exact (non-stochastic) 
acceptance probability in Eq. f|A.2[) [13] 

Pacc(U,U') <min{l,|det(M)|- 2 } . (A.15) 

So far we discussed the case of a proposal followed by an acceptance-rejection 
steps. Eq. (I A . 2 [) can be generalized to an arbitrary number of acceptance-rejection 
steps as discussed in Section [2J The algorithm satisfies detailed balance and this 
is also true if (some of) the Metropolis acceptance-rejections steps are replaced 
by their stochastic counterpart Eq. (lA.llj) . 

A. 3 Gauge field interpolation 

In order to simplify a bit the notation we consider an algorithm like it is described 
in Section IA.2I with one stochastic acceptance-rejection step. In practice we ap- 
ply the gauge field interpolation method to acceptance-rejection steps involving 
the Schur complements (the global Schur complement D as well as the Schur 
complement in the blocks Db when we use recursive domain decomposition, see 
Eq. d33J). 

For the gauge proposal U — > U' we consider the gauge field interpolation [/, as 
it is given in Eq. f )4.16p . For each of the transitions Ui — > U i+ %, i = 0, 1, - ■ • ,N — 1 
we introduce a pseudofermion field r^. The equilibrium probability distribution 
for the enlarged system is 

\ D{U9) -i m \2 Sg (u) o-iz^r 1 ^ 2 

m*},u,m = - z U ]iet{D(UiW (a.16) 
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and depends now also on the proposed configuration U' . Integrating over the 
pseudofermions gives 

» N-l 

P(U)= / n D[th] P({Vj}, U, U') . (A.17) 

The conditional probability to generate the pseudofermions {rjj} given the pro- 
posal U —>■[/' is 

We use the property det (£>([/)) = det(D(f/ 9 )). 

If we consider the reversed gauge proposal U' —> U (i.e. U = U' 9 and 
Un = U 9 ), the intermediate configurations C/j in Eq. ( I4.16P are the same but they 
are traversed in reversed order and therefore the pseudofermion 77, is associated 
with the transition Ui+i — > Ui. The probability distribution for the enlarged 
system is now 

IDtf/'^V 1 ^! 2 -S g (U') e -\D{U t+1 )-^\ 2 

PlM,v,v) = , n , demu , +lW < A - i9 > 



i=0 



and the conditional probability to generate the pseudofermions {r/j} is 



P(in\ U' U) ^ e -\D{u l+1 )-^i\ 2 

p 'w^^»=^^=n iL(z) M - (A - 20) 

The acceptance probability for the gauge proposal U — > U' is 

r N-l 

= / n^fe] eH&|2min { i ' eEf =° 1HM ^ |2+i ° |2 } > (a. 21) 

^ i=0 

where 

M i = D(U i+1 )- 1 D{U i ). (A.22) 

P acc ([7, C/') in Eq. (IA.21I) fulfills the detailed balance condition Eq. (IA.12j) or equiv- 
alent ly 

h " ' ] det(M)|" 2 , M = D(U'y 1 D(U) . (A.23) 



Pa,cc(U, U_) _ 1 iXiA . , , ,|_2 1 f TM\ rl -\ 

P acc (U',U) 
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In practice, the global acceptance step Eq. (1A.21I) is done by drawing N Gaussian 
distributed pseudofermions & and accepting or rejecting depending on the argu- 
ment of the min function (i.e. we evaluate the sum in the exponent under the min 
function) . 



B Relative gauge fixing 



The relative gauge fixing of two gauge field configurations U and U' is done using 
a steepest descent scheme introduced by jl4"tl4"5] f° r gauge group SU(3). Using the 
condition Eq. (I4.12p for the gauge transformation g, the minimization condition 
in Eq. ( 14. 9 p can be written similarly to the case of the Landau gauge condition. 
Then one can apply the procedure of [IS] to fix the relative gauge. 

At each point x where the gauge transformation g is defined we have to solve 
the condition 



where 



minReTr {l - {g{x) ] f ■ {W { {x) + W h {x)} 

9{x) 



Wt{x) = J2 U '( x >^9(x + il) 2 U\x^), 
ft 

W h (x) = ^ U'\x - p,, fi)g(x - fi) 2 U(x -#,//)■ 



(B.l) 

(B.2) 
(B.3) 



Using the steepest descent method of [45J we get the minimizing transformation 
field g(x) iteratively through 



9\ x ) 



a 

<>X1>i _ 2 



A - A f - ^Tr (A - A 1 ") 



with a scaling parameter a and 

A = W f (x) + W h (x) 
The minimum is reached when 9{x) = 0, where 



9{x) 



Tr 



A- A t --Tr(A- A f ) 



(B.4) 



(B.5) 



(B.6) 



We choose the value a = 0.15. If there is no convergence we reduce it in steps 
of —0.01 and reach values down to a = 0.10. For the SU(3) exponential function 
in Eq. ( 1B.4I) we use the matrix function described in Appendix A of [2T]. The 
numerical cost of the relative gauge fixing can be reduced in the case of a domain 
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Figure 9: The exact global acceptance is plotted as a function of the variance S| 
for simulations of plain Wilson fermions at j3 = 5.6, k = 0.15825 with the 4-step 
PSMS algorithm. The black circles are the same as in Fig. [5] and corresponds to the 
optimal acceptance. The blue diamonds represent the exact acceptance which one would 
get from the Schur complement alone without the additional parameters (last row in 
Tabled]). 



decomposition by denning g(x) only inside the blocks where the active links are 
changed. Further it can be reduced by stopping the iteration when 8(x) < 10 -3 , 
which we find good enough for the purpose of the gauge field interpolation dis- 
cussed in Section 14.31 

C Parametrized acceptance-rejection steps 

The general idea of the PSMS algorithm is to factorize the distribution in Eq. ( 12. 7p 
in several pieces and introduce a recursive update procedure with a computational 
cost ordering. Naively speaking a gauge configuration is proposed by a pure gauge 
algorithm and the fermion determinant is treated in acceptance-rejection steps. 
It is easy to see that the plaquette gauge action and the determinant of the Dirac 
operator are strongly correlated. This correlation can be used to increase the 
acceptance [T3" | IT4" t l2"4"] and also in the case of reweighting [32] • This is an example 
of ultraviolet filtering [201 1471118] . 
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C.l Optimization of the acceptance 

We consider the hierarchy of acceptance-rejection steps i — 1,2, ... ,n in Eq. (12. 3|) . 
The factorization (I2.2p is not unique and can be parametrized. In each acceptance- 
rejection step the action involved might be written as 

i 

S i = J2$ )s{j) , i = l,...,n, (C.l) 

3=0 

with real coefficients and actions S^. The probability to accept the proposal 
for a new gauge field U' starting from U is min {1, exp(— A,)} where 

i 

A, = Si(U') - Si(U) = ^/^AS^) . (C.2) 

j=0 

For Gaussian distributed Aj the acceptance rate is given by erfc with 

£ 2 = ((A, - (A,)) 2 ) . (C.3) 

In the case of the factorization Eq. ( 13. 2j) , n = 2 and is the gauge action, 
= £ 6gC 21n(det(.D 6 )) and S (2) = 21n(det(L>)) are the effective actions of the 
determinants of the blocks and of the Schur complement respectively. In stochastic 
acceptance-rejection steps, like we do for the Schur complement, we use 

AS (2) = 7? t (M t M- 1)77 , (C.4) 

where 77 is a Gaussian noise vector and M = -D(£/') -1 .D(£/). In such case the 
variance S 2 in Eq. (1C.3j) is replaced by the sum of the exact variance and the 



stochastic variance according to Eq. ( 14. 3ft . The variance in Eq. ( 1C.3I) can be 
written explicitly in terms of the coefficients as 

S 2 = (V?) 2 C m + w , (C.5) 

j=o j,fc=o 

where = ((AS^ - (AS^))(A£W - (AS^))). We note that here (•) 

means an average over configurations U in the dynamical ensemble and over gauge 
proposals U' (and over noise r] if applicable). 

At each step % = l,...,n the optimization of the parameters (3^ is done 
by minimizing the variance S 2 in Eq. fl0.3[) . The idea is to use the correlation 
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of low cost actions with the high cost action of the ith step to increase the ith- 
level acceptance rate. In order to get the right distribution after the last step the 
parameters of a specific action has to sum up to the target value f3^ = ^ pf . 
This implies constraints on the parameter. For example the parameter of the 
plaquette action has to sum up to (3^ = (3. In principle, in order to solve for the 
parameters, we can start from the last step i = n, solve for the parameters fllP 
and go to step i — 1. This provides an explicit solution scheme. At step i, we solve 
the linear system of i equations 

i-l 

2C^)(3^ + C {jk ^\ k) = -CWp? , j = 0, . . . , % - 1 , (C.6) 

k=0 

to uniquely determine the values of the coefficients /?} , . . . , 1 . The solutions 
of the steps k > i and one constraint imply = /3® — Y^k=i+i&k ■ 

We emphasize some properties of the parametrized acceptance-rejection steps. 
First of all this quite simple technique guarantees that the distribution of the 
pure gauge proposal has a good overlap with the dynamical distribution. With- 
out parametrization the acceptance rate for lattices bigger than 4 4 would be less 
than few %. The parametrization introduces a /3-shift to higher (3 values in the 
pure gauge update, mainly reflecting the correlation with the determinants on 
the small blocks, see Table [1] and Table [2j In general it is possible to intro- 
duce a new acceptance-rejection step i by defining an auxiliary action with addi- 
tional parameters. These parameters have to sum up to zero when considering all 
acceptance-rejection steps k > i. Their effect is to enhance the acceptance rate 
of these steps. For example we introduced a plaquette action, which uses HYP 
smeared links (one level of smearing) in order to better match the pure gauge 
update with the fermionic weight. This is particularly motivated for simulations 
with HYP smeared Wilson fermions but also helps for plain Wilson fermions, see 
Table [Hand Table [2J We remark that it is not possible to introduce parameters for 
terms which are evaluated stochastically like Eq. (IC.4j) . The effectiveness of the 



parametrization of acceptance-rejection steps is demonstrated in Fig. [91 where we 
compare the exact acceptances in the global step with optimal parameters (black 
circles) to the exact acceptances without parameters (blue diamonds). 



C.2 Tuning the optimal parameters 

The parameters in the acceptance- rejection steps i = 1,2, . . . ,n — 1 are estimated 
from a simulation where the global step i = n (the computationally most costly) 
is left out. Subsequently a full simulation is performed in order to determine the 
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optimal parameters for the global step. Iterating further this procedure does not 
significantly change the values of the parameters. 
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